Training: Add Fine-Tune API Docs #3718
Signed-off-by: Andrey Velichkevich <[email protected]>
079a402 to 61a8f20
Signed-off-by: Andrey Velichkevich <[email protected]>
I added content from the Google doc and one tutorial.
/hold for review
Signed-off-by: Andrey Velichkevich <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andreyvelich
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: Andrey Velichkevich <[email protected]>
/lgtm
@@ -0,0 +1,172 @@
+++
title = "How to Fine-Tune LLM with Kubeflow"
Suggested change:
- title = "How to Fine-Tune LLM with Kubeflow"
+ title = "How to Fine-Tune LLMs with Kubeflow"

[Training Operator Python SDK](/docs/components/training/installation/#installing-training-python-sdk)
implements a [`train` Python API](https://github.com/kubeflow/training-operator/blob/6ce4d57d699a76c3d043917bd0902c931f14080f/sdk/python/kubeflow/training/api/training_client.py#L112)
that simplify ability to fine-tune LLMs with distributed PyTorchJob workers.
Suggested change:
- that simplify ability to fine-tune LLMs with distributed PyTorchJob workers.
+ that simplifies the ability to fine-tune LLMs with distributed PyTorchJob workers.
implements a [`train` Python API](https://github.com/kubeflow/training-operator/blob/6ce4d57d699a76c3d043917bd0902c931f14080f/sdk/python/kubeflow/training/api/training_client.py#L112)
that simplify ability to fine-tune LLMs with distributed PyTorchJob workers.

You need to provide the following parameters to use `train` API:
Suggested change:
- You need to provide the following parameters to use `train` API:
+ You need to provide the following parameters to use the `train` API:
)
```

After you execute `train` API, Training Operator will orchestrate appropriate PyTorchJob resources
Suggested change:
- After you execute `train` API, Training Operator will orchestrate appropriate PyTorchJob resources
+ After you execute `train`, Training Operator will orchestrate appropriate PyTorchJob resources
For example, you can use `train` API as follows to fine-tune BERT model using Yelp Review dataset
from HuggingFace Hub:

```python
If I copy paste this snippet into a notebook, does it run seamlessly? What are the required dependencies? Do we need to provide a pip install
command to make sure that this snippet runs? Also, what is the expected output?
Let me add the prerequisites to run this API.
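As an illustration of the dependency question above, here is a minimal sketch of a pre-flight check that could run in a notebook before calling `train`. The module names `kubeflow`, `transformers`, and `peft` are assumptions, since this thread does not list the actual prerequisites. The helper `missing_modules` is hypothetical and not part of the SDK.

```python
from importlib.util import find_spec

# Assumed top-level import names; the thread does not list the real prerequisites.
ASSUMED_MODULES = ["kubeflow", "transformers", "peft"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Install before calling train:", ", ".join(missing))
    else:
        print("All assumed prerequisites are importable.")
```

The check only confirms that the packages can be found. It does not verify versions or cluster access.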
After you execute `train` API, Training Operator will orchestrate appropriate PyTorchJob resources
to fine-tune LLM.

## Architecture
This should go to a "Reference"
You can implement your own trainer for other ML use-cases such as image classification,
voice recognition, etc.

## User Value for this Feature
I think we can just fold this under Why Training Operator Fine-Tune API Matter ?
by stripping the title User Value for this Feature
image classification, or another ML domain, fine-tuning can drastically improve performance and
applicability of pre-existing models to new datasets and problems.

## Why Training Operator Fine-Tune API Matter ?
I feel like this is out of place here. A how-to guide provides a step-by-step, sequenced guide on how to achieve a very specific task. A how-to guide generally does not provide Reference or Explanation. It seems to me we are writing some paragraphs that would be better suited to an "Explanation" section. This is the fourth content type proposed by Diataxis, see here: https://diataxis.fr/explanation/
I can very well see a page under "Explanation" titled "LLM Fine-Tune APIs in Kubeflow" where we discuss why we need it and how it fits into the ecosystem. Basically what you wrote already, plus a little bit of refactoring. WDYT?
That makes sense, but how will a user map one guide to another?
E.g. how will a user quickly understand which explanation relates to which user guide when looking at the website content?
That's a very good question. In the how-to guide, we can have something like: "If you want to learn more about how the fine-tune API fits in the Kubeflow ecosystem, head to <...>".
And in the explanation guide, we can say something like: "Head to <...> for a quick start tutorial on using LLM Fine-tune APIs. Head to <...> for a reference architecture on the control plane implementation".
And generally we can have links to how-tos in tutorials and reference guides. So in general, let's try to link related topics together when it makes sense for a user to follow that train of thought.
Sure, what do you think about it, @StefanoFioravanzo?
7d30f12
Looks great!
Signed-off-by: Andrey Velichkevich <[email protected]>
I addressed your comments @StefanoFioravanzo.
How can we show the expected output? Our LLM trainer doesn't support any output yet: https://github.com/kubeflow/training-operator/blob/master/sdk/python/kubeflow/trainer/hf_llm_training.py#L178, so we need to work out in the future how users should consume the fine-tuned model.
Issue + KF 1.10 tag? :)
Signed-off-by: Andrey Velichkevich <[email protected]>
@StefanoFioravanzo I believe I addressed all of your comments. Does it look good to you?
@andreyvelich yes it does, thank you!
Awesome documentation! Thank you!
/lgtm
@andreyvelich the links in fine-tuning.md are giving 404 page not found. Am I missing something?
@deepanker13 Did you check these links via the website preview: https://deploy-preview-3718--competent-brattain-de2d6d.netlify.app/ ?
@andreyvelich it's working with the preview. Thanks for the awesome documentation!
@andreyvelich shall we merge this one?
Sure, let's merge it. Thanks everyone for the review!
Related: kubeflow/training-operator#2013
This is a draft PR for our new Fine-Tune API in Kubeflow Training Operator.
We will work on the page structure in this Google doc to finalise it: https://docs.google.com/document/d/18PuuaDRISj5mlrBn1GJrxwuB6Z5zTtXKpVbLUIeLx-8/edit?usp=sharing.